Spearman Rank Correlation Test (Spearman’s ρ)#

Spearman’s rank correlation answers a simple question:

“As X increases, does Y tend to increase or decrease (not necessarily linearly)?”

It is a nonparametric measure of association because it works on ranks instead of raw values.

Common use-cases:

  • ordinal data (Likert scales, rankings)

  • monotonic but nonlinear relationships (e.g., exponential, saturating curves)

  • data with outliers where Pearson’s correlation is overly sensitive


Learning goals#

By the end you can:

  • explain what Spearman’s ρ measures (and what it does not measure)

  • compute ρ from first principles using only NumPy

  • perform a hypothesis test via a permutation test (p-value without distribution tables)

  • interpret ρ, p-values, and (bootstrap) confidence intervals

  • visualize monotonic vs non-monotonic relationships with Plotly

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)

try:
    from scipy.stats import spearmanr as scipy_spearmanr
except Exception:
    scipy_spearmanr = None

1) When should you use Spearman?#

Spearman is a good default when you care about monotonic association:

  • Pearson (r): “How well does a straight line explain the relationship?”

  • Spearman (ρ): “How well does a monotonic curve explain the relationship?”

Quick intuition:

  • If you can sort by X and Y tends to move in one direction (up or down), Spearman tends to be large in magnitude.

  • If Y goes up then down (U-shape / sinusoidal), Spearman can be near 0 even though there is a strong relationship.
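
This intuition is easy to check on tiny made-up arrays. The sketch below uses SciPy's `spearmanr` (the same function imported with a fallback later in this notebook); the toy vectors here are illustrative only, not part of the datasets used below:

```python
import numpy as np
from scipy.stats import spearmanr

x = np.arange(-10, 11)  # symmetric integer grid around 0

rho_mono, _ = spearmanr(x, x**3)    # strictly increasing: ranks perfectly aligned
rho_ushape, _ = spearmanr(x, x**2)  # down-then-up: the monotonic signal cancels

print(f"monotone cube : rho = {rho_mono:.3f}")   # expect 1.000
print(f"U-shape square: rho = {rho_ushape:.3f}")  # expect ~0
```

The cube is a strictly increasing transform of x, so the ranks line up exactly; the square is strong structure that Spearman cannot see, because the ordering reverses halfway through.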

A rough decision table:

| Situation                                    | Prefer |
| -------------------------------------------- | ------ |
| Linear relationship, roughly symmetric noise | Pearson |
| Monotonic but nonlinear relationship         | Spearman |
| Ordinal measurements (ranks, Likert scales)  | Spearman |
| Lots of ties (discrete/rounded data)         | Spearman works, but power drops; consider Kendall’s τ |
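
To make the last row concrete, here is a small sketch with invented Likert-style data (rounding a latent quantity to 5 levels creates many ties); it compares SciPy's `spearmanr` and `kendalltau`:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

rng = np.random.default_rng(0)

# Invented example: two 5-point Likert-style ratings of the same latent quantity.
# Rounding to a handful of levels produces many tied ranks.
latent = rng.uniform(0, 10, size=60)
x = np.clip(np.round(latent / 2), 0, 4)
y = np.clip(np.round((latent + rng.normal(0, 2, size=60)) / 2), 0, 4)

rho, p_rho = spearmanr(x, y)
tau, p_tau = kendalltau(x, y)
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
print(f"Kendall tau  = {tau:.3f} (p = {p_tau:.4f})")
```

Both statistics handle ties (averaged ranks for Spearman, a tie-corrected denominator for Kendall), but with this many ties Kendall’s τ is often the more robust summary.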

Assumptions (what the test needs)#

For the usual interpretation, you generally want:

  • paired observations (xᵢ, yᵢ) from the same unit

  • independence across units (repeated measures require a model that accounts for the dependence)

  • a meaningful ordering of values (ordinal or continuous)

No normality assumption is required for Spearman’s ρ itself.

2) What Spearman’s ρ actually is#

Step 1: replace values by ranks.

If

  • x = [10, 20, 20, 40]

then

  • rank(x) = [1, 2.5, 2.5, 4] (ties get the average rank)

Step 2: compute Pearson correlation on those ranks:

$$\rho = \mathrm{corr}(\mathrm{rank}(x),\ \mathrm{rank}(y))$$

Special case (no ties): you’ll often see

$$\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)}$$

where $d_i = \mathrm{rank}(x_i) - \mathrm{rank}(y_i)$.

def drop_nan_pairs(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")

    mask = np.isfinite(x) & np.isfinite(y)
    return x[mask], y[mask]


def rankdata_average(x):
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError("rankdata_average expects a 1D array")

    n = x.size
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]

    ranks = np.empty(n, dtype=float)

    i = 0
    while i < n:
        # Extend j to cover the whole run of values tied with sorted_x[i]
        j = i
        while j + 1 < n and sorted_x[j + 1] == sorted_x[i]:
            j += 1

        # Positions i..j (0-based) get the average of ranks i+1..j+1
        avg_rank = 0.5 * (i + j) + 1.0
        ranks[order[i : j + 1]] = avg_rank
        i = j + 1

    return ranks


def pearson_r(x, y):
    x, y = drop_nan_pairs(x, y)
    if x.size < 2:
        raise ValueError("Need at least 2 paired observations")

    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt(np.sum(xc**2) * np.sum(yc**2))
    if denom == 0:
        return np.nan

    return float(np.sum(xc * yc) / denom)


def spearman_rho(x, y):
    x, y = drop_nan_pairs(x, y)
    rx = rankdata_average(x)
    ry = rankdata_average(y)
    return pearson_r(rx, ry)


x_demo = np.array([10, 20, 20, 40])
rankdata_average(x_demo)
array([1. , 2.5, 2.5, 4. ])
# Tie-free identity check: corr(ranks) == 1 - 6*sum(d^2)/(n*(n^2-1))
x = np.array([3, 1, 4, 2, 5])
y = np.array([30, 10, 40, 20, 60])

rx = rankdata_average(x)
ry = rankdata_average(y)

d2 = np.sum((rx - ry) ** 2)
n = x.size

rho_corr = spearman_rho(x, y)
rho_d2 = 1 - (6 * d2) / (n * (n**2 - 1))

rho_corr, rho_d2
(1.0, 1.0)

3) How to interpret ρ#

  • Sign: direction of monotonic association

    • ρ > 0: larger X tends to come with larger Y

    • ρ < 0: larger X tends to come with smaller Y

  • Magnitude: strength of monotonic ordering agreement

    • |ρ| = 1: perfect monotonic relationship (ranks are perfectly aligned or perfectly reversed)

    • ρ ≈ 0: no monotonic association (but there may still be a strong non-monotonic relationship!)

Effect size is context-dependent, but one rough mental scale in common use is:

  • |ρ| ~ 0.1 small, ~0.3 medium, ~0.5+ large

Always remember: correlation describes association, not causation.

4) The hypothesis test#

We test whether the monotonic association could plausibly be 0 in the population.

Typical hypotheses:

  • H₀: ρ = 0 (no monotonic association)

  • H₁: ρ ≠ 0 (two-sided), or ρ > 0 / ρ < 0 (one-sided)

A p-value answers:

“If there were no monotonic association, how often would we see a Spearman ρ at least this extreme just by chance?”
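
As a quick sketch of the one-sided vs two-sided choice: recent SciPy versions expose it through the `alternative` parameter of `spearmanr`. On an invented positively associated sample:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=40)
y = x + rng.normal(0, 4, size=40)  # positive monotonic trend plus noise

# p-value under each alternative hypothesis
pvals = {
    alt: spearmanr(x, y, alternative=alt).pvalue
    for alt in ("two-sided", "greater", "less")
}
for alt, p in pvals.items():
    print(f"{alt:>9}: p = {p:.4f}")
```

When the observed association matches the predicted direction, the one-sided p-value is roughly half the two-sided one; predicting the wrong direction pushes the p-value toward 1.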

Why a permutation test?#

If H₀ is true, the pairing between X and Y is arbitrary. So we can:

  1. keep X fixed

  2. randomly permute Y many times

  3. recompute ρ each time

The permutation distribution tells us how extreme our observed ρ is under H₀. This gives an (approximate) p-value without relying on normality assumptions.

def spearman_permutation_test(
    x,
    y,
    n_resamples=5000,
    alternative="two-sided",
    seed=0,
    return_null=False,
):
    x, y = drop_nan_pairs(x, y)
    n = x.size
    if n < 2:
        raise ValueError("Need at least 2 paired observations")

    rx = rankdata_average(x)
    ry = rankdata_average(y)

    rx_c = rx - rx.mean()
    ry_c = ry - ry.mean()

    denom = np.sqrt(np.sum(rx_c**2) * np.sum(ry_c**2))
    if denom == 0:
        raise ValueError("Spearman correlation is undefined when ranks have zero variance")

    rho_obs = float(np.sum(rx_c * ry_c) / denom)

    rng_local = np.random.default_rng(seed)
    rhos = np.empty(n_resamples, dtype=float)

    for i in range(n_resamples):
        ry_perm = rng_local.permutation(ry_c)
        rhos[i] = np.sum(rx_c * ry_perm) / denom

    if alternative == "two-sided":
        extreme = np.abs(rhos) >= abs(rho_obs)
    elif alternative == "greater":
        extreme = rhos >= rho_obs
    elif alternative == "less":
        extreme = rhos <= rho_obs
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

    p_value = float((extreme.sum() + 1) / (n_resamples + 1))

    if return_null:
        return rho_obs, p_value, rhos
    return rho_obs, p_value

Confidence intervals (optional but useful)#

A p-value answers “is there evidence of association?” A confidence interval answers “how big might the association be?”

We can approximate a CI for ρ using the bootstrap:

  • resample paired observations (xᵢ, yᵢ) with replacement

  • compute ρ for each resample

  • take percentiles

def spearman_bootstrap_ci(
    x,
    y,
    n_resamples=3000,
    ci=0.95,
    seed=0,
    return_samples=False,
):
    x, y = drop_nan_pairs(x, y)
    n = x.size
    if n < 2:
        raise ValueError("Need at least 2 paired observations")

    rng_local = np.random.default_rng(seed)
    rhos = np.empty(n_resamples, dtype=float)

    for i in range(n_resamples):
        idx = rng_local.integers(0, n, size=n)
        rhos[i] = spearman_rho(x[idx], y[idx])

    alpha = (1 - ci) / 2
    lo, hi = np.quantile(rhos, [alpha, 1 - alpha])

    if return_samples:
        return float(lo), float(hi), rhos
    return float(lo), float(hi)

5) Visual intuition: monotonic vs non-monotonic#

We’ll compare Pearson vs Spearman on four toy datasets:

  1. linear

  2. monotonic but nonlinear

  3. linear + outliers

  4. non-monotonic (U-shape)

The goal is to build intuition for what changes when you switch from raw values to ranks.

n = 80

x1 = rng.uniform(0, 10, size=n)
y1 = 2 * x1 + rng.normal(0, 2, size=n)

x2 = rng.uniform(0, 5, size=n)
y2 = x2**2 + rng.normal(0, 1, size=n)

x3 = x1.copy()
y3 = y1.copy()
out_idx = rng.choice(n, size=4, replace=False)
y3[out_idx] += rng.normal(0, 30, size=4)

x4 = rng.uniform(0, 10, size=n)
y4 = (x4 - 5) ** 2 + rng.normal(0, 2, size=n)

datasets = [
    ("Linear", x1, y1),
    ("Monotonic nonlinear", x2, y2),
    ("Linear + outliers", x3, y3),
    ("Non-monotonic (U-shape)", x4, y4),
]

fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=[name for name, _, _ in datasets],
)

for i, (name, x, y) in enumerate(datasets):
    row = 1 if i < 2 else 2
    col = (i + 1) if i < 2 else (i - 1)

    fig.add_trace(
        go.Scatter(
            x=x,
            y=y,
            mode="markers",
            marker=dict(size=7, opacity=0.75),
            showlegend=False,
        ),
        row=row,
        col=col,
    )

    pear = pearson_r(x, y)
    spear = spearman_rho(x, y)

    x_annot = float(np.quantile(x, 0.05))
    y_annot = float(np.quantile(y, 0.95))

    fig.add_annotation(
        row=row,
        col=col,
        x=x_annot,
        y=y_annot,
        text=f"Pearson r = {pear:.3f}<br>Spearman ρ = {spear:.3f}",
        showarrow=False,
        align="left",
        bgcolor="rgba(255,255,255,0.7)",
        bordercolor="gray",
        borderwidth=1,
    )

    fig.update_xaxes(title_text="x", row=row, col=col)
    fig.update_yaxes(title_text="y", row=row, col=col)

fig.update_layout(
    height=650,
    width=1000,
    title_text="Pearson vs Spearman: what changes?",
    margin=dict(t=80),
)

fig

The rank view (why Spearman works for monotonic curves)#

Spearman ignores the actual spacing between values and keeps only the order. Plotting rank(x) vs rank(y) turns a monotonic relationship into something close to a straight line.

x = x2
y = y2

rx = rankdata_average(x)
ry = rankdata_average(y)

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=["Raw values", "Ranks (Spearman works here)"]
)

fig.add_trace(
    go.Scatter(x=x, y=y, mode="markers", marker=dict(size=7, opacity=0.75), showlegend=False),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=rx,
        y=ry,
        mode="markers",
        marker=dict(size=7, opacity=0.75),
        showlegend=False,
    ),
    row=1,
    col=2,
)

fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_yaxes(title_text="y", row=1, col=1)

fig.update_xaxes(title_text="rank(x)", row=1, col=2)
fig.update_yaxes(title_text="rank(y)", row=1, col=2)

fig.update_layout(height=420, width=1000, title_text="Same data, different view")
fig

Key property: invariance to monotonic transformations#

Because Spearman uses ranks, any strictly increasing transform leaves the ordering unchanged:

  • ρ(x, y) = ρ(f(x), y) = ρ(x, g(y)) = ρ(f(x), g(y))

Pearson correlation generally changes under nonlinear transforms because it cares about the spacing between values.

x = x2
y = y2

x_t = x**3

y_pos = y - y.min() + 1
y_t = np.log(y_pos)

pear_orig = pearson_r(x, y)
pear_trans = pearson_r(x_t, y_t)

spear_orig = spearman_rho(x, y)
spear_trans = spearman_rho(x_t, y_t)

print(f"Pearson (original)     = {pear_orig:.4f}")
print(f"Pearson (transformed)  = {pear_trans:.4f}")
print(f"Spearman (original)    = {spear_orig:.4f}")
print(f"Spearman (transformed) = {spear_trans:.4f}")

fig = px.bar(
    {
        "metric": ["Pearson", "Spearman", "Pearson", "Spearman"],
        "version": ["original", "original", "transformed", "transformed"],
        "value": [pear_orig, spear_orig, pear_trans, spear_trans],
    },
    x="metric",
    y="value",
    color="version",
    barmode="group",
    title="Spearman is invariant to monotonic transforms",
)
fig.update_yaxes(range=[-1, 1])
fig
Pearson (original)     = 0.9582
Pearson (transformed)  = 0.8908
Spearman (original)    = 0.9550
Spearman (transformed) = 0.9550

6) Running the Spearman hypothesis test end-to-end#

We’ll use the monotonic nonlinear dataset and compute:

  • observed ρ

  • a permutation-based p-value (two-sided)

  • a bootstrap 95% CI

x = x2
y = y2

rho_obs, p_value, rho_null = spearman_permutation_test(
    x,
    y,
    n_resamples=5000,
    alternative="two-sided",
    seed=1,
    return_null=True,
)

ci_lo, ci_hi, rho_boot = spearman_bootstrap_ci(
    x,
    y,
    n_resamples=3000,
    ci=0.95,
    seed=2,
    return_samples=True,
)

print(
    f"n={x.size} | Spearman ρ={rho_obs:.3f} | permutation p={p_value:.4f} | "
    f"bootstrap 95% CI=({ci_lo:.3f}, {ci_hi:.3f})"
)

if scipy_spearmanr is not None:
    rho_scipy, p_scipy = scipy_spearmanr(x, y)
    print(f"SciPy spearmanr: ρ={rho_scipy:.3f} | p={p_scipy:.4f}")
n=80 | Spearman ρ=0.955 | permutation p=0.0002 | bootstrap 95% CI=(0.902, 0.980)
SciPy spearmanr: ρ=0.955 | p=0.0000

Visualizing the permutation null distribution#

Under H₀, ρ should bounce around 0. The p-value is the fraction of permuted correlations at least as extreme as the observed one.

fig = px.histogram(
    x=rho_null,
    nbins=50,
    title="Permutation null distribution of Spearman ρ (H₀: no association)",
    labels={"x": "ρ (permuted)"},
)

fig.add_vline(
    x=rho_obs,
    line_color="crimson",
    line_width=3,
    annotation_text=f"observed ρ={rho_obs:.3f}",
    annotation_position="top right",
)

fig

Visualizing uncertainty with the bootstrap#

Bootstrap samples give an intuitive picture of how much ρ would vary if we repeated the study.

fig = px.histogram(
    x=rho_boot,
    nbins=50,
    title="Bootstrap distribution of Spearman ρ",
    labels={"x": "ρ (bootstrap)"},
)

fig.add_vline(x=ci_lo, line_color="gray", line_dash="dash", annotation_text=f"{ci_lo:.3f}")
fig.add_vline(x=ci_hi, line_color="gray", line_dash="dash", annotation_text=f"{ci_hi:.3f}")
fig.add_vline(
    x=rho_obs,
    line_color="crimson",
    line_width=3,
    annotation_text=f"observed {rho_obs:.3f}",
    annotation_position="top right",
)

fig

7) Interpreting the result (what it means)#

Suppose you got:

  • ρ = 0.62

  • p = 0.003 (two-sided)

  • 95% CI: (0.35, 0.80)

Interpretation:

  • Direction: positive monotonic association (higher X tends to come with higher Y)

  • Strength: moderately strong ordering agreement

  • Evidence: under H₀ of no monotonic association, such an extreme ρ would be rare (p small)

  • Uncertainty: plausible population ρ values are roughly in the CI range

Important caveats:

  • correlation ≠ causation

  • ρ near 0 does not rule out non-monotonic relationships

  • with very large n, tiny |ρ| can still produce tiny p-values (look at effect size + CI)
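
The large-n caveat is easy to demonstrate with simulated data (a sketch; the slope here is chosen so the association is practically negligible):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(123)

n = 100_000
x = rng.normal(size=n)
y = 0.03 * x + rng.normal(size=n)  # practically negligible association

rho, p = spearmanr(x, y)
print(f"rho = {rho:.4f}, p = {p:.2e}")  # tiny |rho|, yet p is essentially zero
```

The test correctly detects a nonzero association, but the effect size and CI, not the p-value, tell you whether it matters.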

8) Pitfalls & diagnostics#

  • Always plot the data (Spearman can miss U-shapes).

  • Many ties (discrete/rounded data) reduce power; Kendall’s τ is another option.

  • Outliers affect Spearman less than Pearson, but can still distort the story if they change the ordering.

  • Non-independence (time series, repeated measures) invalidates the usual p-value; you need a model that respects dependence.

  • Multiple testing: if you compute many correlations, adjust p-values or control FDR.
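
A sketch of the multiple-testing point: on pure noise, a batch of Spearman tests still produces "significant" hits at the raw threshold. Bonferroni division is one simple correction (sample sizes here are invented):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
n, n_tests = 50, 200

# 200 correlations between independent noise vectors: every null is true.
pvals = np.array([
    spearmanr(rng.normal(size=n), rng.normal(size=n)).pvalue
    for _ in range(n_tests)
])

alpha = 0.05
print(f"raw 'hits'      : {(pvals < alpha).sum()} / {n_tests}")            # expect ~10
print(f"Bonferroni hits : {(pvals < alpha / n_tests).sum()} / {n_tests}")  # expect 0
```

Benjamini–Hochberg FDR control is a less conservative alternative when you expect some true associations among the tests.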

9) Exercises#

  1. Implement another tie method (e.g., "min" ranking) and compare results.

  2. Create a dataset with a clear U-shape and confirm ρ ≈ 0 even though the relationship is strong.

  3. Increase n and reduce the noise; watch how p-values shrink even when ρ changes only a little.

References#

  • Spearman, C. (1904). The proof and measurement of association between two things.

  • SciPy: scipy.stats.spearmanr